Dongbin Zhao

Are Full Rollouts Necessary for On-Policy Distillation?

Jun 01, 2026
Via arXiv

Reinforcement Learning for Laser Additive Manufacturing Scan-Order Optimisation: A Bilevel Proxy–FEA Diagnostic Framework for Reward and World-Model Diagnosis

May 24, 2026
Via arXiv

X-DiffVLA: X-Embodied Diffusion Action Heads for Vision-Language-Action Models

May 24, 2026
Via arXiv

$π$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

Apr 15, 2026
Via arXiv

Saliency-Guided Representation with Consistency Policy Learning for Visual Unsupervised Reinforcement Learning

Apr 07, 2026
Via arXiv

Posterior Optimization with Clipped Objective for Bridging Efficiency and Stability in Generative Policy Learning

Apr 02, 2026
Via arXiv

Dynamic Dual-Granularity Skill Bank for Agentic RL

Mar 30, 2026
Via arXiv

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

Mar 26, 2026
Via arXiv

Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving

Mar 25, 2026
Via arXiv

DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving

Mar 25, 2026
Via arXiv